Comparing Value-Function Estimation Algorithms in Undiscounted Problems
Author
Abstract
We compare the scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can scale exponentially slowly with the number of states. We identify the reasons for this slow convergence and show that both TD(λ) and Q-learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
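To make the comparison concrete, the sketch below (illustrative only; the chain, parameters, and update are assumptions, not the paper's construction) runs undiscounted tabular TD(0) on a simple chain and contrasts the 1/k averaging step-size schedule, under which value information propagates down the chain slowly, with a fixed learning rate:

```python
# Minimal sketch (assumed setup, not the paper's): undiscounted TD(0) on a
# deterministic left-to-right chain that pays +1 on the final transition.
import numpy as np

def td0_chain(n_states=10, episodes=2000, alpha_fixed=None):
    """Return TD(0) value estimates; the true value is 1.0 in every state."""
    V = np.zeros(n_states + 1)          # V[n_states] is the terminal state
    visits = np.zeros(n_states)
    for _ in range(episodes):
        for s in range(n_states):       # one full walk down the chain
            r = 1.0 if s == n_states - 1 else 0.0
            visits[s] += 1
            # 1/k "averaging" schedule vs. a fixed learning rate
            alpha = alpha_fixed if alpha_fixed is not None else 1.0 / visits[s]
            V[s] += alpha * (r + V[s + 1] - V[s])   # undiscounted TD(0) update
    return V[:n_states]

print(td0_chain())                  # 1/k schedule: early states lag behind
print(td0_chain(alpha_fixed=0.1))   # fixed step size: near 1.0 everywhere
```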
Similar resources
A K-step look-ahead analysis of Value Iteration algorithms for Markov decision processes
We introduce and analyze a general look-ahead approach for Value Iteration algorithms used in solving both discounted and undiscounted Markov decision processes. This approach, based on the value-oriented concept interwoven with multiple adaptive relaxation factors, leads to accelerating procedures which perform better than the separate use of either the value-oriented concept or of relaxa...
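For orientation, plain value iteration with a single fixed relaxation factor can be sketched as below (the paper's procedures additionally use a value-oriented K-step look-ahead with multiple adaptive factors; everything here, including the discounted setting, is an illustrative assumption):

```python
# Hedged sketch: value iteration with one fixed relaxation factor omega.
# omega = 1 recovers plain value iteration; omega > 1 over-relaxes each sweep.
import numpy as np

def relaxed_value_iteration(P, R, gamma=0.95, omega=1.2, tol=1e-8):
    """P: (A, S, S) transition tensor, R: (A, S) expected rewards."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        TV = np.max(R + gamma * (P @ V), axis=0)   # Bellman optimality backup
        V_new = V + omega * (TV - V)               # relaxed update
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```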
Estimation of LOS Rates for Target Tracking Problems using EKF and UKF Algorithms - a Comparative Study
One of the most important problems in target tracking is Line Of Sight (LOS) rate estimation for use with the PN (proportional navigation) guidance law. This paper deals with estimation of the position and LOS rates of a target with respect to the pursuer from available noisy RF seeker and tracker measurements. Because exact estimation is important in tracking problems, the target position and Line O...
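For background, one generic extended Kalman filter predict/update cycle has the shape below (a sketch only; the paper's seeker/tracker state and measurement models are not reproduced here, so f, h, and their Jacobians are placeholder assumptions):

```python
# Generic EKF step; all models are caller-supplied placeholders.
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """x, P: prior mean/covariance; z: measurement; f, h: process and
    measurement functions with Jacobians F_jac, H_jac; Q, R: noise covariances."""
    # Predict through the (nonlinear) process model
    x_pred = f(x)
    P_pred = F_jac(x) @ P @ F_jac(x).T + Q
    # Update with the new measurement
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```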
The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems
This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long run average expected reward has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a character...
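Restated in shorthand (the notation below is assumed for illustration, not taken from the paper): writing V_n(i) for the maximal total expected reward over n epochs starting from state i, and g(i) for the long-run average expected reward from i, the paper characterizes exactly when

```latex
% Assumed notation: V_n(i) = maximal total expected n-epoch reward from state i,
% g(i) = long-run average expected reward from i (state-dependent in the multichain case).
\lim_{n \to \infty} \bigl( V_n(i) - n\, g(i) \bigr)
```

exists and is finite for each initial state and each final reward vector.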
Regular Policies in Abstract Dynamic Programming
We consider challenging dynamic programming models where the associated Bellman equation and the value and policy iteration algorithms commonly exhibit complex and even pathological behavior. Our analysis is based on the new notion of regular policies. These are policies that are well-behaved with respect to value and policy iteration, and are patterned after proper policies, which are central...
Affine Monotonic and Risk-Sensitive Models in Dynamic Programming
In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. ...
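The affine structure referred to above can be sketched as follows (notation assumed for illustration): for each stationary policy μ, the dynamic programming mapping acts affinely and monotonically on the cost function J.

```latex
% Assumed notation: A_mu is a nonnegative matrix and b_mu a nonnegative vector
% indexed by states x, y; nonnegativity of A_mu makes T_mu monotone.
(T_\mu J)(x) \;=\; \sum_{y} A_\mu(x, y)\, J(y) \;+\; b_\mu(x),
\qquad A_\mu(x, y) \ge 0, \quad b_\mu(x) \ge 0.
```

Monotonicity of the mapping is what allows undiscounted nonnegative-cost, multiplicative-cost, and exponential-cost risk-sensitive models to be handled in a single framework.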
Publication date: 1999